Signal Lexicon Higher - Level Linguistic

Authors

  • Michael Phillips
  • James Glass
  • Victor Zue
Abstract

In 1989, our group first reported on the development of SUMMIT, a segment-based, speaker-independent, continuous-speech recognition system [13]. The initial version of SUMMIT made use of fairly simple context-independent models for the lexical labels. Recently, we have begun to incorporate more complex models of lexical labels that take into account a variety of contextual factors. These changes, along with an improved corrective training procedure for adapting pronunciation arc weights and a larger set of training data, have reduced the error rate by almost a factor of two on the Resource Management task.

INTRODUCTION

Variability in speech arises from many different sources. For example, acoustic variability can be due to noise or channel characteristics, phonetic variability can be due to contextual or speaker-specific effects, and dialect effects can alter speakers' pronunciations of words. Speech recognition systems must have mechanisms to model these various types of variability, and sometimes it may be necessary to deal with different types of variability with different mechanisms. For example, it may be difficult to find a single model that is able to deal effectively with both low-level acoustic variability and dialect differences among speakers.

In the SUMMIT system, we have made a rough distinction between the sort of variability that we can deal with within our phonetic models (including acoustic variability and speaker differences at a phonetic level) and higher-level phonological variation (including dialect effects and word-boundary effects). In both cases, our goal is to account for as much of the variability as possible, and it is clear that at least some of the variability is due to contextual effects. Just as there are many types of variability, there are many types of contextual effects, including local phonetic effects (coarticulation), effects of stress, phrase-level effects (such as prepausal lengthening), and higher-level effects (such as sentential stress or dialect differences). Therefore, we need to find mechanisms that are able to account for many different types of contextual factors.

In this paper, we will describe a number of experiments intended to address some of the problems mentioned above. So far, we have attempted to account for some of the contextual effects on our phonetic models, although the approach that we have taken should also apply to the higher levels of the system. Briefly, we have found that we can increase recognition performance by creating context-specific models or by using more flexible models. However, we did not see a performance increase when we combined the two in a straightforward manner, presumably because more flexible models tend to require more training data. If, instead of using context-specific models, we accounted for context by adjusting the input to the phonetic models (creating a context-normalized input vector), we were able both to account for contextual effects and to use more flexible phonetic models, resulting in the highest performance for our system.

In the following sections, we will first provide an overview of the system. This will be followed by a more detailed description of the changes we have made to the system, and evaluation results on the Resource Management task.

1 This research was supported by DARPA under Contract N0001489-J-1332, monitored through the Office of Naval Research.

SYSTEM OVERVIEW

Component Description

A block diagram of the SUMMIT system is shown in Figure 1.
The acoustic processing consists of a model of the human peripheral auditory system as a front-end, a hierarchical segmentation algorithm to produce a network of possible acoustic segments, an automatically defined set of segmental measurements for each hypothesized segment, and finally, a statistical classifier for providing a probability of each label given a segment. The result of this analysis branch of the system is a network of possible phonetic interpretations of the speech signal. Each arc in the network has a list of probabilities of the labels used to represent the lexicon [13].
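
The segment network described above can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not the SUMMIT implementation: the class name SegmentArc, the function best_label_path, the boundary indices, the labels, and all probability values are hypothetical. It represents each arc as a hypothesized segment between two boundaries with a probability for each label, and then picks the path whose best-scoring labels have the highest total log probability.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentArc:
    """One hypothesized acoustic segment (hypothetical structure)."""
    start: int         # index of the boundary where the segment begins
    end: int           # index of the boundary where the segment ends
    label_probs: dict  # label -> P(label | segment), made-up values


def best_label_path(arcs, first, last):
    """Return (log_prob, labels) for the best path from `first` to `last`.

    Arcs always run forward in time, so visiting them in order of start
    boundary guarantees that the score at a boundary is final before any
    arc leaving that boundary is extended.
    """
    best = {first: (0.0, [])}
    for arc in sorted(arcs, key=lambda a: (a.start, a.end)):
        if arc.start not in best:
            continue  # this boundary cannot be reached from `first`
        score, labels = best[arc.start]
        label, prob = max(arc.label_probs.items(), key=lambda kv: kv[1])
        candidate = (score + math.log(prob), labels + [label])
        if arc.end not in best or candidate[0] > best[arc.end][0]:
            best[arc.end] = candidate
    return best.get(last)


# A toy network over boundaries 0..3 with two competing segmentations.
ARCS = [
    SegmentArc(0, 1, {"s": 0.7, "z": 0.3}),
    SegmentArc(1, 2, {"ah": 0.6, "ih": 0.4}),
    SegmentArc(0, 2, {"sh": 0.4, "s": 0.3, "z": 0.3}),
    SegmentArc(2, 3, {"m": 0.9, "n": 0.1}),
]

score, labels = best_label_path(ARCS, first=0, last=3)
print(labels)  # ['s', 'ah', 'm']
```

In this toy search, paths with more segments multiply more probabilities and so tend to score lower; the sketch makes no attempt to correct for that, and it ignores the lexical constraints that a full recognizer would apply.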




Publication date: 2006